ABSTRACT

Wavelet-domain hidden Markov models have proven to be
useful tools for statistical signal and image processing. The
hidden Markov tree (HMT) model captures the key features
of the joint statistics of the wavelet coefficients of real-world
data. One potential drawback to the HMT framework is the
need for computationally expensive iterative training (using
the EM algorithm, for example). In this paper, we propose
two reduced-parameter HMT models that capture the general
structure of a broad class of grayscale images. The image
HMT (iHMT) model leverages the fact that for a large class
of images the structure of the HMT is self-similar across
scale. This allows us to reduce the complexity of the iHMT
to just nine easily trained parameters (independent of the
size of the image and the number of wavelet scales). In the
universal HMT (uHMT) we take a Bayesian approach and
fix these nine parameters. The uHMT requires no training
of any kind. Although these two new models are simple, we
show using a series of image estimation/denoising experiments
that they retain nearly all of the key structures modeled by
the full HMT. Based on these new models, we develop a
shift-invariant wavelet denoising scheme that outperforms
all algorithms in the current literature.

1.  INTRODUCTION 

Statistical image processing problems, such as estimation,
detection, and classification, rely on knowledge of the joint
probability density function (pdf), f(x), of the image x.
Since f(x) is usually not known or is too complex to specify
exactly, models that accurately approximate f(x) are
critical to image processing algorithms.

There have been several approaches to modeling the local
joint statistics of image pixels in the spatial domain, the
Markov random field model [1] being the most prevalent.
However, spatial-domain models are limited in their ability
to describe large-scale behavior. Markov random field models
can be improved by incorporating a larger neighborhood
of pixels, but this rapidly increases their complexity.

Transform-domain models are based on the idea that
often a linear, invertible transform will “restructure” the
image, leaving transform coefficients whose structure is simpler
to model. Real-world images are well characterized by
their singularity (edge and ridge) structure. The wavelet
transform captures this singularity structure, and provides
a natural and powerful domain for image modeling [2]. We
aim, therefore, to model the joint pdf of the wavelet coefficients
w of an image, and design wavelet-domain processing
algorithms based on this model of f(w).

2.  IMAGES  IN  THE  WAVELET  DOMAIN 

The wavelet transform is an atomic decomposition of an image
with basis functions that are shifted and dilated versions
of an oscillating mother wavelet [2]. The primary properties
of wavelet transforms make wavelet-domain statistical
image processing attractive [2, 3]:

P1. Locality: Each wavelet coefficient represents the image
content localized in spatial location and frequency.

P2. Multiresolution: The wavelet transform represents
the image at a nested set of scales.

P3. Edge Detection: Wavelets act as local edge detectors.
The edges in the image are represented by large
wavelet coefficients at the corresponding spatial locations.

P4. Decorrelation: The wavelet coefficients of real-world
images tend to be approximately decorrelated.

P5. Energy Compaction: The wavelet transforms of
real-world images tend to be sparse. A wavelet coefficient
is large only if edges are present within the
support of the wavelet.

Properties P1 and P2 lead to a natural arrangement of
the wavelet coefficients in a quadtree structure with three
subbands representing the horizontal, vertical, and diagonal
edges in the image (see Fig. 1). The Compaction property
(P5) follows from the fact that the edges constitute
only a very small portion of a typical image; consequently,
we can closely approximate an image by just a few (large)
wavelet coefficients. Furthermore, the Decorrelation property
(P4) indicates that the dependencies between wavelet
coefficients are predominantly local. The primary properties
give wavelet transforms significant structure, which we
codify in the following secondary properties:

S1. NonGaussianity: The wavelet coefficients have peaky,
heavy-tailed marginal distributions.

S2. Persistency: Large/small values of wavelet coefficients
tend to propagate through the scales of the
quadtrees.

NonGaussianity  follows  immediately  from  Energy  Compaction 
(P5).  Persistency  follows  from  the  Edge  Detection  (P3) 
and  Multiresolution  (P2)  properties. 

In general, the wavelet coefficients w are indexed by two
integers: one for the scale (dilation), and one for the shift.
In this paper, we will adopt an abstract indexing system
and use only one integer i, whose value ranges from 1 to
$N^2$ for an $N \times N$ image.


3.  HIDDEN  MARKOV  TREE  MODELS 


The secondary properties of wavelet transforms give rise to
joint wavelet statistics that are succinctly captured by the
wavelet-domain hidden Markov tree (HMT) models introduced
by Crouse et al. (see [4] for a more detailed discussion).

The HMT models the nonGaussian marginal pdf $f(w_i)$
(S1) as a Gaussian mixture whose components are labeled
by a hidden state $S_i \in \{S, L\}$. The $S_i$ dictate from which of
the two components in the mixture model $w_i$ is drawn, and
thus characterize (in the statistical sense) the magnitude
of $w_i$. State S corresponds to a zero-mean, low-variance
Gaussian, while state L corresponds to a zero-mean,
high-variance Gaussian. If we let

$g(x; \mu, \sigma^2) := \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$  (1)

denote the Gaussian pdf, then we can write

$f(w_i | S_i = S) := g(w_i; 0, \sigma_{S,i}^2)$  (2)

$f(w_i | S_i = L) := g(w_i; 0, \sigma_{L,i}^2)$  (3)

with $\sigma_{L,i}^2 > \sigma_{S,i}^2$. The marginal pdf $f(w_i)$ is the convex
combination of the conditional densities

$f(w_i) = p_i^S g(w_i; 0, \sigma_{S,i}^2) + p_i^L g(w_i; 0, \sigma_{L,i}^2)$,  (4)

with $p_i^S = 1 - p_i^L$. The $p_i^S$ and $p_i^L$ can be interpreted as
the probability that $w_i$ is small or large (in the statistical
sense), respectively.
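
As an illustration of the two-state mixture described above, the following is a minimal Python sketch. The function names and the numerical values in the usage note are hypothetical, and the posterior is computed for a single coefficient in isolation, ignoring the tree dependencies introduced later.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian density g(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(w, p_small, var_small, var_large):
    """Two-state, zero-mean Gaussian mixture marginal of one coefficient."""
    p_large = 1.0 - p_small
    return (p_small * gaussian_pdf(w, 0.0, var_small)
            + p_large * gaussian_pdf(w, 0.0, var_large))

def posterior_large(w, p_small, var_small, var_large):
    """P(state = L | w) by Bayes' rule, for a single isolated coefficient."""
    numerator = (1.0 - p_small) * gaussian_pdf(w, 0.0, var_large)
    return numerator / mixture_pdf(w, p_small, var_small, var_large)
```

With illustrative values such as `p_small = 0.8`, `var_small = 1` and `var_large = 25`, a coefficient near zero is attributed mostly to the small state, while a coefficient of magnitude 10 is attributed almost entirely to the large state.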

The persistence of wavelet coefficient magnitudes across
scale (S2) is modeled by linking the hidden states in a
Markov tree. The resulting dependency graph has a quadtree
topology that mirrors the quadtree topology of the wavelet
coefficients (see Fig. 1(b)). Each subband is represented with
its own quadtree; this assumes that the subbands are
statistically independent.

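Each parent→child state-to-state link has a corresponding
transition matrix that quantifies statistically the degree
of persistence of large/small coefficients:

$A_i = \begin{bmatrix} p_i^{S \to S} & p_i^{S \to L} \\ p_i^{L \to S} & p_i^{L \to L} \end{bmatrix}$  (5)

with $p_i^{S \to L} = 1 - p_i^{S \to S}$ and $p_i^{L \to S} = 1 - p_i^{L \to L}$.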
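
To make the role of the transition matrix concrete, here is a small Python sketch that propagates a parent's state probabilities to a child. It assumes a row-stochastic layout in which row 0 is the small parent state and row 1 the large parent state; the function names and the probabilities in the usage note are hypothetical.

```python
def transition_matrix(p_ss, p_ll):
    """Build [[S->S, S->L], [L->S, L->L]] from the two persistence probabilities."""
    return [[p_ss, 1.0 - p_ss],
            [1.0 - p_ll, p_ll]]

def child_state_pmf(parent_pmf, A):
    """Propagate a parent pmf [P(S), P(L)] one level down the tree."""
    p_s, p_l = parent_pmf
    return [p_s * A[0][0] + p_l * A[1][0],
            p_s * A[0][1] + p_l * A[1][1]]
```

For example, with `p_ss = 0.9` and `p_ll = 0.6`, a parent that is equally likely to be small or large yields a child that is small with probability 0.65.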

Denote the parameters needed to specify an HMT model
by the vector Θ. Members of Θ are the mixture variances
for each state, $\sigma_{S,i}^2$ and $\sigma_{L,i}^2$, the transition probabilities
$p_i^{S \to S}$ and $p_i^{L \to L}$, and a mass function for the hidden state
of the root node, $p_0$. These parameters can be fit to a given set
of training data using the Expectation-Maximization (EM)
algorithm [4]. The training yields an approximate maximum
likelihood estimate of the model parameters given the
training data, yielding a good approximation of the joint
density function f(w) of the wavelet coefficients and thus
f(x).

Figure 1: (a) Quadtree organization of the wavelet coefficients.
The four children wavelet coefficients divide the
spatial localization of the parent coefficient. (b) 2-D HMT
model. Each black node is a wavelet coefficient; each white
node is the corresponding hidden state. Links represent
dependencies between states.

In general, the HMT model for an N × N image has
approximately 4n parameters, with $n := N^2$. In some applications,
this large number of parameters could make the
HMT model cumbersome. To accurately specify 4n parameters
for an n-pixel image requires significant a priori information
about the image. If this information is unavailable,
we run the risk of over-fitting the model. Crouse et al. [4]
reduce the total number of HMT parameters to approximately
4L, with L the number of wavelet scales (typically
4-10), by tying parameters within each scale. While this is a
significant reduction, a large quantity of a priori image
information is still required to specify the parameters
without over-fitting.
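
As a rough worked example of these counts, the sketch below evaluates the approximate figures 4n and 4L quoted in the text for a given image size. The function name is hypothetical, and the counts are only the approximations stated above, not exact parameter tallies.

```python
def hmt_parameter_counts(N, num_scales):
    """Approximate HMT parameter counts for an N x N image.

    'untied' uses roughly four parameters per coefficient (4n, n = N*N);
    'tied_within_scale' uses roughly four per wavelet scale (4L).
    """
    n = N * N
    return {"untied": 4 * n, "tied_within_scale": 4 * num_scales}

counts = hmt_parameter_counts(512, 9)
```

For a 512 × 512 image with nine scales, this gives about one million parameters for the untied model against 36 for the model tied within scale.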

Often, the a priori image information takes the form of
training data. Training algorithms such as the EM algorithm,
especially for large data sets or data that have been
severely corrupted by noise, can be computationally
prohibitive. This makes the wavelet HMT model impractical
for applications requiring computationally efficient processing.
Furthermore, in many applications training data is
unavailable. In such cases, an empirical Bayesian approach
could be taken and a model fit to the data at hand. This
is an effective approach if processing time is not an issue
(see denoising examples in Fig. 2). However, if the observed
data is severely corrupted (by noise, for example),
then training may not be robust, and the model parameters
will not characterize the joint image pdf accurately.

To address these problems, we will reduce the number
of parameters in the HMT model. In doing this, the
HMT model will become less accurate; two images that have
different parameterizations with the general form of the
HMT may have the same parameterization in the
reduced-parameter model. What we gain is a reduction in complexity;
less a priori information will be needed to specify the
model parameters, and training will become more robust.


4. REDUCED-PARAMETER HMT IMAGE MODELS

Crouse et al. assumed that every image has a different HMT
model, with the 4L parameters being specified by training
on an observation [4]. In this section we take a different
approach. We specify a new HMT model, called the iHMT,
with a drastically reduced set of parameters (only 9), that
incorporates properties common to all images in a class.
The parameterization of the iHMT is based on the fact
that for real-world images, the structure of the HMT is
self-similar across scale [5].

Furthermore, we have found that many real-world images
have similar iHMT parameters. By fixing one set of parameters,
called the universal HMT (uHMT), we can take a
strictly Bayesian approach to the estimation problem,
eliminating the need for training altogether.

4.1.  Tertiary  Properties  of  the  Wavelet  Coefficients 

The wavelet transforms of real-world images exhibit strong
statistical properties beyond the primary (P1-P5) and
secondary (S1, S2) properties. In designing our
reduced-parameter HMT models, we will leverage
the following tertiary properties of wavelet transforms:

T1. Exponential decay across scale: The magnitudes
of the wavelet coefficients of real-world images tend
to decay exponentially across scale.

T2. Stronger persistence at finer scales: The persistence
of large/small wavelet coefficient magnitudes
becomes stronger at finer scales.

The exponential decay property (T1) stems from the
overall smoothness and self-similarity of images. Roughly
speaking, a typical real-world image consists of smooth regions
separated by a finite number of discontinuities. This
results in a 1/f-type spectral behavior, which leads to the
exponential decay of the wavelet coefficients across scale [2].

We  can  obtain  intuition  behind  property  T2  by  consid¬ 
ering  the  simple  yet  powerful  image  model  of  Cohen  and 
D’Ales  [6].  They  model  an  image  as  piecewise  smooth  with 
a  finite  number  of  discontinuities.  Consider  a  1-D  slice  from 
such  an  image.  Clearly  it  is  also  piecewise  smooth  with  a 
finite  number  (say  M)  of  discontinuities. 

Since there are a finite number of discontinuities and
the spatial resolution of the wavelet coefficients becomes
finer as the scale j increases (P2), there is some $j_{crit}$ such that for
all $j \ge j_{crit}$, each wavelet basis function has at most one
discontinuity inside its spatial support. We call this condition
isolation of the edges. Given no a priori information
about the locations of the discontinuities, the fact that the
spatial resolutions of the wavelet coefficients become finer
exponentially implies that the probability that every edge
is isolated goes to 1 exponentially.
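
The exponential approach to edge isolation can be checked numerically under a simplified setting. The sketch below assumes M edge locations drawn independently and uniformly on the unit interval, and Haar-like basis functions whose supports at scale j are the 2^j non-overlapping dyadic intervals; both assumptions, and the function name, are illustrative only.

```python
def isolation_probability(num_edges, scale):
    """Probability that all edges fall in distinct dyadic intervals of width 2**-scale.

    Assumes edge locations are i.i.d. uniform on [0, 1) and supports do not overlap.
    """
    bins = 2 ** scale
    prob = 1.0
    for k in range(num_edges):
        # The (k+1)-th edge must avoid the k intervals already occupied.
        prob *= max(0.0, 1.0 - k / bins)
    return prob
```

Under these assumptions the failure probability `1 - isolation_probability(M, j)` behaves like M(M-1)/2 · 2^(-j) at fine scales, so it roughly halves with each additional scale.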

By P4, for fine scales such that $j \ge j_{crit}$ there will be on
the order of M wavelet coefficients that are “large” when
compared to other coefficients at the same scale (exactly
M if we are using the Haar wavelet). Each of these large
coefficients will also have a large child, since the children
wavelet basis functions simply divide up the spatial support
of the parent. Each of the small coefficients will
have small children, since there is no chance for any of them
to encounter an edge.

In 2-D, the situation is similar except that instead of
discontinuities at points, we now have discontinuities along
curves. At $j_{crit}$, all wavelet basis functions whose spatial
support intersects such a curve will be “large.” Again, each
of these coefficients will also have at least one large child,
while the small coefficients will spawn only small children.


4.2.  The  iHMT  model 

Based  on  the  tertiary  properties  of  the  wavelet  transforms 
of real-world images, we can specify the HMT model parameters
in a hyper-parametric form. The coefficient decay and
the  change  in  coefficient  persistence  are  easily  modeled  by 
imposing  structure  on  how  the  mixture  variances  and  state 
transition  probabilities  change  across  scale.  Because  the 
tertiary  properties  are  common  to  many  real-world  images, 
the  resulting  model  describes  the  common  overall  behavior 
of  real-world  images  in  the  wavelet  domain. 

We can easily model the exponential decay of wavelet
coefficients (T1) through the mixture variances of the HMT
model. Since the HMT mixture variances characterize the
magnitudes of the wavelet coefficients, we will require that
they decay exponentially across scale:

$\sigma_{S,j}^2 = C_{\sigma_S} 2^{-j \alpha_S}$  (6)

$\sigma_{L,j}^2 = C_{\sigma_L} 2^{-j \alpha_L}$  (7)

To have $\sigma_{S,j}^2 < \sigma_{L,j}^2$ for all scales, we require $\alpha_S \ge \alpha_L$. The
result is an HMT for images with 1/f power spectra.
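
A minimal Python sketch of such a variance schedule follows. It assumes the decay takes the form C · 2^(-α j) for each state, with j the scale index; that functional form, the function name, and all numerical values are assumptions made for illustration.

```python
def mixture_variances(scale, c_small, alpha_small, c_large, alpha_large):
    """Small-state and large-state variances at a given scale.

    Assumes variance = C * 2**(-alpha * scale) for each state.
    """
    var_small = c_small * 2.0 ** (-alpha_small * scale)
    var_large = c_large * 2.0 ** (-alpha_large * scale)
    return var_small, var_large
```

If the small-state decay rate is at least as large as the large-state rate and the small-state constant is the smaller of the two, the small-state variance stays below the large-state variance at every scale.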

We  will  model  the  change  in  the  degree  of  coefficient 
magnitude  persistency  by  considering  the  way  that  the  state 
transition  probabilities  change  across  scale. 

Again, consider a 1-D signal consisting of smooth regions
having M jump discontinuities. The isolation of edges
at fine scales controls the persistency and novelty probabilities
(and hence the form of the transition matrix) in the
HMT. If each of the M edges in the 1-D slice is isolated then
there is no opportunity for a novel large coefficient to come
from a small parent; the only way a coefficient can be large
is if its parent is large. Thus, $p_j^{S \to L} \to 0$ exponentially as
$j \to \infty$. In other words, $p_j^{S \to S} \to 1$, since once a wavelet basis
function lies over a smooth region, all of its children also
lie over that smooth region. If a basis function lies over an
edge, one and only one of its children will lie over the edge.
This is an exact statement for the Haar basis functions,
and a close approximation for longer wavelets. Therefore,
the large wavelet coefficient gives rise to one large and one
small wavelet coefficient, and $p_j^{L \to L} \to 1/2$ and
$p_j^{L \to S} \to 1/2$. For a more in-depth analysis, see [5].

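The edge isolation probability going to 1 exponentially
means that the asymptotic values for persistency and novelty
parameters are also approached exponentially. This
gives a state transition matrix (see (5)) specified by four
parameters:

$A_j = \begin{bmatrix} 1 - C_{SS} 2^{-\gamma_S j} & C_{SS} 2^{-\gamma_S j} \\ \frac{1}{2} - C_{LL} 2^{-\gamma_L j} & \frac{1}{2} + C_{LL} 2^{-\gamma_L j} \end{bmatrix}$  (8)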
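
The following Python sketch builds a scale-dependent transition matrix of this kind. It assumes that the small-to-large probability decays as C_SS · 2^(-γ_S j) and that the large-to-large probability approaches 1/2 from above as 1/2 + C_LL · 2^(-γ_L j); the sign of that correction, the row layout, the function name, and the values used are all assumptions for illustration.

```python
def transition_matrix_at_scale(scale, c_ss, gamma_s, c_ll, gamma_l):
    """Rows: parent state (S, L); columns: child state (S, L)."""
    p_sl = c_ss * 2.0 ** (-gamma_s * scale)        # novelty: small parent, large child
    p_ll = 0.5 + c_ll * 2.0 ** (-gamma_l * scale)  # persistence of large coefficients
    if not (0.0 <= p_sl <= 1.0 and 0.0 <= p_ll <= 1.0):
        raise ValueError("parameters do not give valid probabilities at this scale")
    return [[1.0 - p_sl, p_sl],
            [1.0 - p_ll, p_ll]]
```

At fine scales the first row tends to (1, 0) and the second to (1/2, 1/2), matching the asymptotic values discussed above. At coarse scales the exponential terms can push entries outside [0, 1], which is why the sketch validates them.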


The only parameter in the HMT not yet accounted for
is the probability mass function on the hidden state value
of the root coefficients, $p_0$ (just one number in our case,
since the hidden state can only take two different values).
Taking this parameter as is, we have reduced the number
of parameters that specify the iHMT model to nine:

$\Theta_i = \{ \alpha_S, \alpha_L, C_{\sigma_S}, C_{\sigma_L}, \gamma_S, \gamma_L, C_{SS}, C_{LL}, p_0 \}$.  (9)

4.3.  A  “universal”  iHMT:  The  uHMT 

Now that we have an image model specified by a small set
of parameters $\Theta_i$, we must find a way of determining them.
The first possibility would be to derive a constrained EM
algorithm to give pseudo-MLE estimates of $\Theta_i$ given training
data. Deriving the steps for this algorithm is difficult,
and there is no guarantee that the training would be faster
than in the unconstrained case.

Another  possibility  is  to  fix  the  parameters  directly. 
This  yields  an  iHMT  model  for  a  class  of  images,  with  each 
member  in  the  class  being  treated  as  statistically  equivalent. 
Although  we  clearly  lose  accuracy  by  viewing  all  images  of 
interest  as  statistically  equivalent,  we  totally  eliminate  the 
need  for  training.  This  saves  us  a  tremendous  amount  of 
computation.  For  example,  on  a  512  x  512  image  the  EM 
algorithm can take anywhere from minutes to hours to converge
on a typical workstation.

To see how much variation in iHMT parameters there is
across grayscale, photograph-like images, we trained HMT
models for a set of normalized images and examined their
parameters. The variance and persistence decays were measured
by fitting a line to the log of the variance vs. scale
for each state. The decays were very similar for all of the
images. Since the images were normalized, the range over
which the variances decayed was similar as well. These
observations lead us to believe that a specific, “universal” set
of iHMT parameters can reasonably characterize
photograph-like images. We call the HMT with this set of parameters
the uHMT model.
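
The line-fitting step mentioned above can be sketched as an ordinary least-squares fit to the logarithm of the variance against scale. The sketch assumes a base-2 logarithm and a model of the form variance ≈ C · 2^(-α · scale); the function name and that parameterization are assumptions, not the exact procedure used for the measurements.

```python
import math

def fit_log2_decay(scales, variances):
    """Least-squares line through (scale, log2(variance)).

    Returns (C, alpha) such that variance is approximately C * 2**(-alpha * scale).
    """
    ys = [math.log2(v) for v in variances]
    count = len(scales)
    mean_x = sum(scales) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in scales)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scales, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 2.0 ** intercept, -slope
```

Applied to variances that decay exactly as 8 · 2^(-1.5 j), the fit returns C = 8 and α = 1.5 up to rounding error.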

The  simplicity  of  the  uHMT  model  also  allows  us  to 
apply  it  in  situations  where  the  cost  of  a  standard  HMT 
would  be  prohibitive.  For  instance,  we  have  developed  a 
fast O(n log n) shift-invariant estimation scheme (discussed
briefly  in  Section  5  and  in  detail  in  [5])  based  on  the  uHMT 
parameters  that  delivers  state-of-the-art  performance  (see 
Fig.  2). 

5.  APPLICATION  TO  IMAGE  DENOISING 

To demonstrate the effectiveness of the uHMT for modeling
an image’s wavelet coefficients, we estimate an image
submerged in additive white Gaussian noise. Translated into
the wavelet domain, the problem is as follows:

given y = w + n, estimate w,  (10)

where n is a Gaussian random field whose components are
independent and identically distributed with zero mean and
known variance $\sigma_n^2$.

Since we are viewing w as a realization of a random
field whose joint pdf is modeled by the HMT, we take a
Bayesian approach to the estimation problem. The conditional
density f(y|w) is given by the problem; it is an
independent, Gaussian random field with mean w. Using
the HMT model for f(w), we can solve the Bayes equation
for the posterior f(w|y).

To obtain the model parameters, Crouse et al. take
an empirical Bayesian approach [4]. The HMT parameters
used to model f(w|Θ) are first estimated from the observed
noisy data y and then “plugged in” to the Bayes equation
(after accounting for the noise).

For the Bayes estimator, we calculate the conditional
mean of the posterior f(w|y, Θ) using the pointwise
transformation

$\hat{w}_i = E[w_i | y, \Theta] = \sum_{q \in \{S, L\}} p(S_i = q | y, \Theta) \, \frac{\sigma_{q,i}^2}{\sigma_n^2 + \sigma_{q,i}^2} \, y_i$  (11)

to obtain the minimum mean-square error (MMSE) estimate of
w. Results using the empirical Bayesian HMT estimator,
shown in Fig. 2(d) and Table 1, are competitive in both
visual quality and PSNR with redundant wavelet shrinkage.
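
The pointwise conditional-mean estimator can be sketched as a posterior-weighted combination of two Wiener-style gains applied to the noisy coefficient. In the sketch below the posterior state probability is taken as an input, since computing it requires inference over the whole tree; the function name and the numerical values in the usage note are hypothetical.

```python
def mmse_estimate(y, post_small, var_small, var_large, noise_var):
    """Conditional-mean estimate of one wavelet coefficient from its noisy value y.

    post_small is P(state = S | all observations); P(state = L | ...) is its complement.
    """
    post_large = 1.0 - post_small
    gain_small = var_small / (noise_var + var_small)
    gain_large = var_large / (noise_var + var_large)
    return (post_small * gain_small + post_large * gain_large) * y
```

With `var_small = 1`, `var_large = 9` and `noise_var = 1`, a coefficient known to be in the small state is shrunk by a factor of 0.5, while one known to be in the large state is shrunk by only 0.9, so likely edges are preserved and likely noise is attenuated.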

With  the  uHMT  parameters,  we  have  a  prior  on  w  and 
the  estimation  problem  can  be  approached  from  a  purely 
Bayesian  standpoint.  Since  we  have  eliminated  training, 
the estimation algorithm is truly O(n) and takes only a few
seconds  to  run  on  a  workstation. 

To test this new Bayesian estimator, we denoised a set
of images using the uHMT with parameters: $\alpha_L = \alpha_S = 5/4$,
$C_{\sigma_S}$ = 2^, $C_{\sigma_L}$ = 2'3, $\gamma_S = \gamma_L = 1$, $C_{SS} = C_{LL} = 32/5$,
and $p_0 = 1/2$. The results, given in Table 1 and Fig.
2(e), are almost identical to those of the more complicated empirical
Bayes HMT approach, suggesting that we have lost almost
nothing by totally eliminating training.

Image estimates obtained using an orthogonal wavelet
transform frequently exhibit visual artifacts, usually in the
form of ringing around edges. These artifacts can be combated
by averaging together estimates obtained from all
different shifts of the image [7]. The resulting shift-invariant
estimate is given by

$\hat{x} = \mathrm{Average}\left( S_{-k,-m}( D( S_{k,m}(y) ) ) \right)_{0 \le k, m \le N-1}$  (12)

where $S_{k,m}(y) = y(s - k, t - m)$ is the 2-D shift operator
and D denotes the estimator (11). Implementing (12)
directly would have computational complexity $O(n^2)$ and
would thus be infeasible for large images. To streamline the
algorithm, we must exploit the redundancies in the wavelet
representations between different shifts of the image.
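
The direct form of the shift-averaging estimate can be sketched as follows. This is the brute-force version that loops over every shift, not the fast algorithm; it assumes circular (wrap-around) shifts, represents images as nested lists, and takes the per-shift estimator as an arbitrary callable `denoise`. All names are hypothetical.

```python
def shift2d(img, k, m):
    """Circular 2-D shift: out[s][t] = img[(s - k) % rows][(t - m) % cols]."""
    rows, cols = len(img), len(img[0])
    return [[img[(s - k) % rows][(t - m) % cols] for t in range(cols)]
            for s in range(rows)]

def shift_invariant_estimate(y, denoise):
    """Average of unshift(denoise(shift(y))) over all circular shifts of y."""
    rows, cols = len(y), len(y[0])
    acc = [[0.0] * cols for _ in range(rows)]
    for k in range(rows):
        for m in range(cols):
            est = shift2d(denoise(shift2d(y, k, m)), -k, -m)
            for s in range(rows):
                for t in range(cols):
                    acc[s][t] += est[s][t]
    count = rows * cols
    return [[v / count for v in row] for row in acc]
```

Each of the n shifts requires a full pass of the estimator, which is what makes the direct implementation quadratic in the number of pixels.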

In the wavelet domain, each shift of the image corresponds
to a different tree of wavelet coefficients. The
wavelet coefficient trees for different shifts overlap, with
common coefficients occupying entire subtrees. Averaging
estimates for different shifts amounts to averaging the
$p(S_i = q | y, \Theta)$ for each tree in which $w_i$ appears, and then
using the result in (11) (we assume that Θ is the same for
each shift of the image). The way in which the wavelet coefficient
trees of different shifts overlap allows an O(n log n)
shift-invariant denoising algorithm [5]. The results of Table 1
and Fig. 2(f) indicate that this denoising algorithm
defines the new state-of-the-art: in general, we obtain a
1-1.5 dB gain over thresholding with the redundant wavelet
transform [7, 8].


Figure 2: (a) Original 256 × 256 “Boats” image; (b)
Noisy boats image, with $\sigma_n = 0.1$, PSNR=20dB. Boats
image denoised using (c) redundant hard-thresholding
using empirical best threshold [8], PSNR=26.3dB; (d)
empirical Bayesian HMT estimator [4], PSNR=26.5dB;
(e) uHMT Bayesian estimator, PSNR=26.4dB; (f)
shift-invariant uHMT estimator, PSNR=27.4dB.


6.  CONCLUSIONS 

Hidden  Markov  Trees  capture  the  primary  aspects  of  image 
structure  in  the  wavelet  domain.  In  this  paper,  we  have 
shown  that  additional  image  structure  can  be  exploited  by 
constraining  the  HMT  parameters  to  have  a  certain  form. 
The  resulting  model,  the  iHMT,  has  only  9  parameters. 

A set of “universal” parameters arises naturally from
the form of the iHMT. These nine numbers completely specify
a model for a large class of real-world images, eliminating
any need for training in the estimation algorithm
without compromising denoising performance. Having the
model fully specified facilitates the implementation of a
shift-invariant estimation algorithm which offers
state-of-the-art performance.


Table 1: Image estimation results for 256 × 256 images corrupted
with additive white Gaussian noise of $\sigma_n = 0.1$.
Entries are the peak signal-to-noise ratio (PSNR) in dB,
PSNR := $-20 \log_{10}( \|x - \hat{x}\|_2 / N )$. Pixel intensity values
were normalized between 0 and 1. All results use the
Daubechies-8 wavelet. “R-HMT” is the shift-invariant estimator;
“uHMT” uses the “universal” parameters presented
in Section 5; “E-HMT” uses the empirical Bayesian estimator
of [4]; “R-Thr” uses a hard-thresholded redundant
wavelet transform using the thresholds in [8].


Image      R-HMT   uHMT   E-HMT   R-Thr
Baby       29.6    28.9   29.2    29.5
Birthday   26.4    25.8   25.8    25.3
Boats      27.4    26.4   26.5    26.3
Bridge     25.3    24.6   25.0    23.7
Buck       29.6    28.4   28.6    29.7
Building   26.6    25.9   26.3    25.8
Camera     27.0    26.2   26.4    26.3
Clown      27.8    26.8   26.8    26.5
Fruit      29.7    28.5   28.6    29.0
Kgirl      29.3    28.3   28.3    28.4
Lenna      27.6    26.7   26.7    26.3
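
The PSNR measure used for Table 1 can be sketched in Python as follows, assuming square N × N images stored as nested lists with intensities scaled to [0, 1]. The function name and the handling of identical images are assumptions made for illustration.

```python
import math

def psnr(original, estimate):
    """PSNR in dB: -20 * log10(||x - x_hat||_2 / N) for N x N images in [0, 1]."""
    n = len(original)
    sq_err = sum((a - b) ** 2
                 for row_o, row_e in zip(original, estimate)
                 for a, b in zip(row_o, row_e))
    if sq_err == 0.0:
        return float("inf")  # identical images: error-free by convention
    return -20.0 * math.log10(math.sqrt(sq_err) / n)
```

A uniform error of 0.1 at every pixel gives a PSNR of 20 dB, in line with the noise level and PSNR reported for the noisy image in Fig. 2(b).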